arXiv:1507.04187v1 [math.FA] 15 Jul 2015


DEALING WITH MOMENT MEASURES 
VIA ENTROPY AND OPTIMAL TRANSPORT 

FILIPPO SANTAMBROGIO 


Abstract. A recent paper by Cordero-Erausquin and Klartag provides a characterization of
the measures $\mu$ on $\mathbb{R}^d$ which can be expressed as the moment measures of suitable convex
functions $u$, i.e. are of the form $(\nabla u)_\# e^{-u}$ for $u : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$, and finds the corresponding
$u$ by a variational method in the class of convex functions. Here we propose a purely
optimal-transport-based method to retrieve the same result. The variational problem becomes the
minimization of an entropy and a transport cost among densities $\rho$, and the optimizer $\rho$ turns
out to be $e^{-u}$. This requires developing some estimates and some semicontinuity results for the
corresponding functionals which are natural in optimal transport. The notion of displacement
convexity plays a crucial role in the characterization and uniqueness of the minimizers.


1. Introduction

We consider in this paper the notion of moment measure of a convex function, which comes
from functional analysis and convex geometry. Given a convex function $u : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$,
we define its moment measure as

\[ \mu := (\nabla u)_\# \rho, \quad \text{where } d\rho = e^{-u}\,dx. \]

The connection of this notion with the theory of optimal transport is straightforward from
the fact that, by Brenier's Theorem, the map $\nabla u$ will be the optimal transport map for the
quadratic cost $c(x,y) = \frac12 |x-y|^2$ from $\rho$ to $\mu$.
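
As a sanity check of the definition, here is a minimal numerical sketch in dimension one. The choice $u(x) = x^2/2 + \frac12\ln(2\pi)$ is our own illustrative assumption (it is not taken from [9]): then $e^{-u}$ is the standard Gaussian density and $\nabla u$ is the identity, so the moment measure should again be the standard Gaussian, with barycenter 0 and variance 1. The helper `integrate` is a hypothetical midpoint quadrature, not a library routine.

```python
import math

def integrate(f, a=-12.0, b=12.0, n=20_000):
    """Midpoint rule on [a, b]; adequate for rapidly decaying integrands."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Assumed example: u(x) = x^2/2 + ln(sqrt(2*pi)), so e^{-u} is the N(0,1) density.
def u(x):
    return 0.5 * x * x + 0.5 * math.log(2.0 * math.pi)

def grad_u(x):
    return x

def rho(x):
    return math.exp(-u(x))

# Total mass of rho = e^{-u}: it should be a probability density.
mass = integrate(rho)
# First and second moments of mu = (grad u)_# rho, computed as integrals
# of grad_u and grad_u^2 against rho (definition of the image measure).
mean_mu = integrate(lambda x: grad_u(x) * rho(x))
second_moment_mu = integrate(lambda x: grad_u(x) ** 2 * rho(x))
```

The vanishing barycenter observed here is one of the necessary conditions recalled in the next paragraph.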

In a recent paper, Cordero-Erausquin and Klartag ([9]) studied the conditions for a measure $\mu$
to be the moment measure of a convex function. First, they identified that an extra requirement
has to be imposed on the function $u$ in order for the problem to be meaningful. The main difficulty
arises in case $u$ is infinite outside a proper convex set $K \subset \mathbb{R}^d$. In this case one needs to require
some continuity properties of $u$ on $\partial K$. Without this condition, every measure with finite first
moment can be the moment measure of a function $u$, which is in general discontinuous on $\partial\{u < +\infty\}$.
Also, there is a strong non-uniqueness of $u$. On the contrary, if one restricts to convex functions $u$
that are continuous $\mathcal{H}^{d-1}$-a.e. on $\partial\{u < +\infty\}$ (those functions are called essentially continuous),
then there is a clear characterization: a measure $\mu$ is a moment measure if and only if it has
finite first moment, its barycenter is 0, and it is not supported on a hyperplane. Moreover, the
function $u$ is uniquely determined by $\mu$ up to space translations.

In [9], the authors first prove that these conditions on $\mu$ are necessary, due to summability
properties of log-concave densities, and they prove that they are sufficient to build $u$ as the
solution of a certain minimization problem.

Here we want to reprove the same existence result with a different method, replacing functional
inequalities techniques with ideas from optimal transport. This aspect seems to be absent from
[9] even if it is not difficult to translate most of the ideas and techniques of Cordero-Erausquin and
Klartag into their optimal transport counterparts. The result is an alternative language, that
is likely to be appreciated by people knowing optimal transport theory, while the community of
functional inequalities could legitimately prefer the original one. Which approach the colleagues
working with both optimal transport and functional inequalities will prefer is an
open and unpredictable issue...

The main idea justifying this approach is the following: many variational problems of the
form

\[ \min_\rho \left\{ \frac12 W_2^2(\rho,\mu) + \int f(\rho(x))\,dx \right\} \]

have been studied in recent years, for different purposes (time-discretization of gradient flows,
urban planning... see for instance [7, 10, 12] and Chapter 7 in [13]). The optimality condition
of this problem reads, roughly speaking, as

\[ \varphi + f'(\rho) = \mathrm{const}, \]

where $\varphi$ is the Kantorovich potential in the transport from $\rho$ to $\mu$ for the cost $c(x,y) = \frac12 |x-y|^2$
(notice that the convex function appearing in Brenier's Theorem is given by $u(x) = \frac12 |x|^2 - \varphi(x)$:
if $T = \nabla u$ is the optimal transport, $-\nabla\varphi$ is the optimal displacement, i.e. $T(x) = x - \nabla\varphi(x)$).

In the case $f(t) = t \ln t$, the condition above implies that $\rho$ is proportional to $e^{-\varphi}$. In order
to obtain the required condition we need to correct the above minimization, so that
we get $u$ instead of $\varphi$. In order to do so, one needs to change the sign and insert a $\frac12 |x|^2$ term,
which leads to

\[ \min_\rho \left\{ -\frac12 W_2^2(\rho,\mu) + \frac12 \int |x|^2\,d\rho(x) + \mathcal{E}(\rho) \right\}, \]

where $\mathcal{E}(\rho)$ denotes the entropy of $\rho$, defined as $\mathcal{E}(\rho) := \int \rho \ln \rho\,dx$ for $\rho \ll \mathcal{L}^d$, and $\mathcal{E}(\rho) = +\infty$
for $\rho$ not absolutely continuous. When $\mu$ has finite second moment and we minimize the above
functional among measures $\rho$ supported on a given compact set $K$, it is easy to check that a
minimizer exists and that we have $\rho = e^{-u}$, where $(\nabla u)_\# \rho = \mu$. Yet, it is highly possible that $u$
is discontinuous at the boundary of $K$. Indeed, at least in the case where also $\mu$ is compactly
supported, one sees that the function $u$ must be Lipschitz (since its gradient is bounded), which
means that it is bounded on $K$, and hence $\rho$ is bounded from below by a positive constant on $K$. Yet, $\rho = 0$
and $u = +\infty$ outside $K$, and $u$ is not essentially continuous. This is a confirmation that every
$\mu \in \mathcal{P}_1(\mathbb{R}^d)$ is a moment measure, if we accept convex functions $u$ which are not essentially
continuous.

The interesting case is the one where we minimize among measures $\rho \in \mathcal{P}_1(\mathbb{R}^d)$, without
restricting their support to a compact domain. In this case the existence of an optimal $\rho$ is
not evident (indeed, the term $W_2^2(\rho,\mu)$ is continuous for the weak convergence of probability
measures on compact sets, but only l.s.c. on unbounded sets, and here it is accompanied by the
negative sign, which makes it u.s.c., while we want to minimize). Also, the lower semicontinuity
of the entropy term $\mathcal{E}(\rho)$ is more delicate on unbounded sets. Here the assumptions on $\mu$ come
into play: they allow us to bound the first moment of a minimizing sequence
$\rho_n$. Then, we can prove that $\rho$ is log-concave and that a precise representative of $\rho$ must vanish
$\mathcal{H}^{d-1}$-a.e. on the boundary of its support, which proves that $u$ is essentially continuous. In order
to handle the case of measures $\rho, \mu \notin \mathcal{P}_2(\mathbb{R}^d)$, we will take advantage of the fact that the second moment
parts of $\frac12 W_2^2(\rho,\mu)$ and $\frac12 \int |x|^2\,d\rho(x)$ cancel each other, and that everything is well-defined for
$\rho, \mu \in \mathcal{P}_1(\mathbb{R}^d)$ if we transform the minimal transport problem for the cost $c(x,y) = \frac12 |x-y|^2$
into the maximal transport problem for $c(x,y) = x \cdot y$.
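
Hence, we study the minimization problem

\[ (P) \qquad \min\left\{ \mathcal{E}(\rho) + \mathcal{T}(\rho,\mu) \,:\, \rho \in \mathcal{P}_1(\mathbb{R}^d) \right\}, \]

where

\[ \mathcal{T}(\rho,\mu) = \sup\left\{ \int (x \cdot y)\,d\gamma(x,y) \,:\, \gamma \in \Pi(\rho,\mu) \right\}. \]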


As we announced above, we prove that this problem has a log-concave solution of the form
$\rho = e^{-u}$, that $u$ is essentially continuous and $\mu$ is the moment measure of $u$. We also prove
uniqueness up to translations of the minimizer and that the condition $\mu = (\nabla u)_\# \rho$, where
$d\rho = e^{-u}\,dx$, is sufficient to minimize. This characterizes $u$. Uniqueness and sufficient conditions
will be based on the notion of displacement convexity in the space $\mathcal{P}_2(\mathbb{R}^d)$ endowed with the
distance $W_2$. It is useful to notice that the displacement convexity of the entropy exactly
corresponds to the Prékopa inequality, which expresses more or less the same fact, but at the
level of convex functions.

Structure of the paper. After this introduction, Section 2 presents the main well-known
tools from optimal transport theory (Wasserstein distances, geodesic interpolation...). The
estimates and semicontinuity of the entropy term that we provide at the end of the section are
also well-known, but presented in an alternative fashion. Section 3 is devoted to the transport cost
$\rho \mapsto \mathcal{T}(\rho,\mu)$ and to its estimates and semicontinuity in a $\mathcal{P}_1(\mathbb{R}^d)$ framework. Section 4 is the core
of the paper, and presents the variational problem that we need to solve in order to find the
log-concave measure $\rho = e^{-u}$ that we aim at. In particular, we prove that $u$ is convex and essentially
continuous. In Section 5 we show that the condition $(\nabla u)_\# e^{-u} = \mu$ is actually equivalent to
the fact that $e^{-u}$ minimizes our functional, using the notion of displacement convexity. Finally,
in Section 6 we compare our approach to the one in [9], explaining why they are equivalent and
how to pass from one to the other.

Acknowledgments. The author would like to thank Guillaume Carlier and Dario
Cordero-Erausquin for interesting discussions about this problem. These discussions have been made
possible by the workshop “New Trends in Optimal Transport” organized at the Hausdorff Center 
for Mathematics in March 2015. 


2. Few words on optimal transport, entropy and technical tools 

We recall here the main notions and notations that we will use throughout the paper. We 
refer to [13] (Chapters 1, 5 and 7) and to [1, 16] for more details and complete proofs. 

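Given two probability measures $\mu, \nu \in \mathcal{P}(\mathbb{R}^d)$ we consider the set of transport plans

\[ \Pi(\mu,\nu) = \left\{ \gamma \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d) \,:\, (\pi_x)_\# \gamma = \mu,\ (\pi_y)_\# \gamma = \nu \right\}, \]

i.e. those probability measures on the product space having $\mu$ and $\nu$ as marginal measures.
For a cost function $c : \mathbb{R}^d \times \mathbb{R}^d \to [0,+\infty]$ we consider the minimization problem

\[ \min\left\{ \int c\,d\gamma \,:\, \gamma \in \Pi(\mu,\nu) \right\}, \]

which is called the Kantorovich optimal transport problem for the cost $c$ from $\mu$ to $\nu$. In particular,
we consider the case $c(x,y) = \frac12 |x-y|^2$. In this case the above minimal value is finite whenever
$\mu, \nu \in \mathcal{P}_2(\mathbb{R}^d)$, where $\mathcal{P}_2(\mathbb{R}^d) := \{\rho \in \mathcal{P}(\mathbb{R}^d) : \int |x|^2\,d\rho(x) < +\infty\}$.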

For the above problem one can prove that the minimal value also equals the maximal value
of a dual problem

\[ \max\left\{ \int \varphi\,d\mu + \int \psi\,d\nu \,:\, \varphi(x) + \psi(y) \le \frac12 |x-y|^2 \right\} \]

and that the optimal function $\varphi$ may be used to construct an optimizer $\gamma$. Indeed, the optimal
$\varphi$ is locally Lipschitz and semiconcave (in particular $x \mapsto \frac12 |x|^2 - \varphi(x)$ is convex) on $\mathrm{spt}(\mu)$ and
differentiable $\mu$-a.e. if $\mu \ll \mathcal{L}^d$; one can define a map $T : \mathbb{R}^d \to \mathbb{R}^d$ through $T(x) = x - \nabla\varphi(x)$
and this map satisfies $T_\# \mu = \nu$, while $\gamma_T := (\mathrm{id}, T)_\# \mu$ (i.e. the image measure of $\mu$ through the
map $x \mapsto (x, T(x))$) belongs to $\Pi(\mu,\nu)$ and is optimal in the above problem. Moreover, the map
$T$ is the gradient of the convex function $u$ given by $u(x) = \frac12 |x|^2 - \varphi(x)$ and is called the optimal
transport map (for the quadratic cost $c(x,y) = \frac12 |x-y|^2$) from $\mu$ to $\nu$. The fact that the optimal
transport map $T$ exists, is unique, and is the gradient of a convex function is known as Brenier's
Theorem (see [5]).

The same could be obtained if one removed from the cost $\frac12 |x-y|^2$ the parts $\frac12 |x|^2$ and $\frac12 |y|^2$,
which only depend on one variable each (hence, their integral against $\gamma$ only depends on its
marginals). In this case we are interested in a transport maximization problem

\[ \max\left\{ \int (x \cdot y)\,d\gamma(x,y) \,:\, \gamma \in \Pi(\mu,\nu) \right\} \]

and the dual problem would become

\[ \min\left\{ \int u\,d\mu + \int v\,d\nu \,:\, u(x) + v(y) \ge x \cdot y \right\}. \]

In this problem it is quite clear that any pair $(u,v)$ can be replaced with $(u,u^*)$ where $u^*(y) :=
\sup_x\, x \cdot y - u(x)$ is the Legendre transform of $u$ and is the smallest function compatible with
$u$ in the constraint $u(x) + v(y) \ge x \cdot y$. Then, it is easy to see by the primal-dual optimality
conditions that the optimal $\gamma$ and the optimal $u$ satisfy

\[ \mathrm{spt}(\gamma) \subset \{(x,y) : u(x) + u^*(y) = x \cdot y\} = \{(x,y) : y \in \partial u(x)\}, \]

which shows that $\gamma$ is concentrated on the graph of a map $T$ (which is single-valued $\mu$-a.e.,
provided $\mu \ll \mathcal{L}^d$), given by $T = \nabla u$.

The value of the minimization problem with the quadratic cost may also be used to define a
quantity, called Wasserstein distance, over $\mathcal{P}_2(\mathbb{R}^d)$:

\[ W_2(\mu,\nu) := \sqrt{\min\left\{ \int |x-y|^2\,d\gamma \,:\, \gamma \in \Pi(\mu,\nu) \right\}}. \]

This quantity may be proven to be a distance over $\mathcal{P}_2(\mathbb{R}^d)$. Moreover, when restricted to the
probabilities supported on a given compact set, i.e. to $\mathcal{P}(K)$ with $K \subset \mathbb{R}^d$ compact, it metrizes
the weak-* convergence of probability measures. The space $\mathcal{P}_2(\mathbb{R}^d)$ endowed with the distance
$W_2$ is called Wasserstein space of order 2 and denoted in this paper by $\mathbb{W}_2(\mathbb{R}^d)$.

The geodesics in this space play an important role in the theory of optimal transport. If
$\mu, \nu \in \mathcal{P}_2(\mathbb{R}^d)$ and $\mu \ll \mathcal{L}^d$, we define $\rho_t := ((1-t)\mathrm{id} + tT)_\# \mu$, where $T$ is the optimal transport
from $\mu$ to $\nu$. This curve $\rho_t$ happens to be a constant speed geodesic for the distance $W_2$
connecting $\mu$ to $\nu$ (in case neither $\mu$ nor $\nu$ is absolutely continuous, it is possible to produce a
geodesic by taking $\rho_t := (\pi_t)_\# \gamma$ where $\pi_t(x,y) = (1-t)x + ty$ and $\gamma$ is optimal in the Kantorovich
problem, which gives the same result if $\gamma = \gamma_T$).
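
The constant-speed property can be illustrated on a toy case. The sketch below is our own illustration, restricted to uniform empirical measures on the real line with the same number of atoms, where the monotone (sorted) pairing is an optimal plan; the function names and the sample points are hypothetical. It checks that $W_2(\rho_0, \rho_t) = t\,W_2(\rho_0, \rho_1)$ along $\rho_t = (\pi_t)_\# \gamma$.

```python
import math

def w2(xs, ys):
    """W_2 between two uniform empirical measures on the line with equally
    many atoms (in one dimension the sorted pairing is an optimal plan)."""
    xs, ys = sorted(xs), sorted(ys)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def interpolate(xs, ys, t):
    """Atoms of rho_t = (pi_t)_# gamma, with pi_t(x, y) = (1 - t) x + t y
    and gamma the sorted pairing of the atoms."""
    return [(1 - t) * x + t * y for x, y in zip(sorted(xs), sorted(ys))]

# Hypothetical sample data: two 4-atom measures on the line.
mu = [-2.0, -0.5, 0.3, 1.7]
nu = [0.1, 0.4, 2.5, 3.0]

d01 = w2(mu, nu)
# Along a constant speed geodesic, W_2(rho_0, rho_t) / t does not depend on t.
speeds = [w2(mu, interpolate(mu, nu, t)) / t for t in (0.25, 0.5, 0.75, 1.0)]
```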

Once we know the geodesics in $\mathbb{W}_2(\mathbb{R}^d)$, one can wonder which functionals $F : \mathcal{P}_2(\mathbb{R}^d) \to \mathbb{R}$
are geodesically convex, i.e. convex along constant speed geodesics. This notion, applied to the
case of the Wasserstein spaces, is also called displacement convexity and has been introduced by
McCann in [11]. It is very useful both to provide uniqueness results for variational problems and
to provide sufficient optimality conditions. A related notion is that of convexity on generalized
geodesics, which corresponds to $t \mapsto F(\rho_t)$ being convex every time that we take a triplet $\rho, \rho_0, \rho_1$,
the optimal maps $T_0$ from $\rho$ to $\rho_0$ and $T_1$ from $\rho$ to $\rho_1$, and take $\rho_t = ((1-t)T_0 + tT_1)_\# \rho$. The
curve $\rho_t$ is not in general the geodesic connecting $\rho_0$ to $\rho_1$. However, the conditions to guarantee
displacement convexity and convexity on generalized geodesics are often very similar and most
of the useful functionals satisfy both notions; we present this notion
here only for the sake of completeness.

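We are in particular interested in the entropy functional $\mathcal{E}$, defined as follows:

\[ \mathcal{E}(\rho) := \begin{cases} \int \rho(x) \ln(\rho(x))\,dx & \text{if } \rho \ll \mathcal{L}^d, \\ +\infty & \text{otherwise.} \end{cases} \]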


This functional is displacement convex as it satisfies the assumption required in [11] (it is also
convex on generalized geodesics, but we will not use it here). We also compute its derivative
along a geodesic $\rho_t$. Suppose $\rho_t = (T_t)_\# \rho$, where $T_t = (1-t)\mathrm{id} + tT$. From a simple change of
variable, we have

\[ \rho_t(T_t(x)) = \frac{\rho(x)}{\det((1-t)I + tDT(x))} \]

(this formula, where $DT$ is the Jacobian matrix of $T$, is valid provided $T_t$ is countably Lipschitz
and injective, which is the case whenever $T$ is an optimal transport). Hence we have

\[ \mathcal{E}(\rho_t) = \int \ln(\rho_t(x))\,d\rho_t(x) = \int \ln \rho_t(T_t(x))\,d\rho(x) = \mathcal{E}(\rho) - \int \ln(\det((1-t)I + tDT(x)))\,d\rho(x). \]

Differentiating at $t = 0$, and using that the derivative of $t \mapsto \ln\det(I + t(DT - I))$ at $t = 0$ is the
trace of $DT - I$, we get

\[ \frac{d}{dt}\left[\mathcal{E}(\rho_t)\right]_{|t=0} = -\int \nabla \cdot (T - \mathrm{id})\,d\rho. \]

Here the divergence is to be understood in the a.e. sense, as $T$ is countably Lipschitz. If we
use $T = \nabla v$ with $v$ convex, this divergence is equal to $\Delta^{ac} v - d$, where $\Delta^{ac} v$ is the absolutely
continuous part of the distributional Laplacian of $v$, which is a measure.
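
A quick numerical illustration of this derivative formula, under assumptions of our own: we work on the line, take $\rho = N(0,1)$ and the linear map $T(x) = \sigma x$ with $\sigma > 0$ (the gradient of the convex function $v(x) = \sigma x^2/2$), and use the closed form $\mathcal{E}(N(0,s^2)) = -\frac12 \ln(2\pi e s^2)$. Then $\rho_t = N(0, ((1-t) + t\sigma)^2)$ and $\nabla \cdot (T - \mathrm{id}) = \sigma - 1$, so the derivative at $t = 0$ should equal $-(\sigma - 1)$. The value of `sigma` is hypothetical.

```python
import math

def gaussian_entropy(s):
    """E(rho) = int rho ln rho for rho = N(0, s^2) on the line,
    i.e. -(1/2) ln(2 pi e s^2)."""
    return -0.5 * math.log(2.0 * math.pi * math.e * s * s)

sigma = 1.8  # assumed slope of the map T(x) = sigma * x

def entropy_along_geodesic(t):
    # T_t(x) = ((1 - t) + t * sigma) x pushes N(0, 1) onto
    # N(0, ((1 - t) + t * sigma)^2).
    return gaussian_entropy((1.0 - t) + t * sigma)

h = 1e-6
finite_difference = (entropy_along_geodesic(h) - entropy_along_geodesic(0.0)) / h
# div(T - id) = sigma - 1 is constant, so -int div(T - id) d rho = -(sigma - 1).
predicted = -(sigma - 1.0)
```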

We also need to underline two other properties of $\mathcal{E}$, in particular lower bounds and
semicontinuity. We stress that both these conditions are easy when we look at measures in $\mathcal{P}(K)$ with
$|K| < \infty$, but become trickier on the whole space, because $\rho \ln \rho$ is not positive.

The last property that we need to recall is the semicontinuity of $\mathcal{E}$ w.r.t. the weak-* convergence
of probability measures, under the extra condition of a bound on the first moment $\int |x|\,d\rho(x)$.
The semicontinuity of functionals of the form

\[ \rho \mapsto \int f(\rho(x))\,d\lambda(x) \]

is standard for convex and superlinear $f$ (which is the case for $f(t) = t \ln(t)$) whenever the
reference measure $\lambda$ (which is the Lebesgue measure here) is finite (see for instance [4] or Chapter
7 in [13]). If $f$ is positive it is easy to get the same result by taking the supremum of functionals
restricted to compact subsets, but this is not possible here. Yet, we can notice that for $\rho \ll \mathcal{L}^d$
we have

\[ \mathcal{E}(\rho) = \int \left( \rho(x) \ln(\rho(x)) + e^{h(x)-1} - \rho(x) h(x) \right) dx + \int h(x) \rho(x)\,dx - \int e^{h(x)-1}\,dx \]

for any function $h$ such that $e^{h(x)-1} \in L^1(\mathbb{R}^d)$ and $h \in L^1(\rho)$. If we suppose $\rho \in \mathcal{P}_1(\mathbb{R}^d)$ we can
take $h(x) = -\sqrt{|x|}$. Then we can write $\mathcal{E} = \mathcal{E}_1 + \mathcal{E}_2 + \mathcal{E}_3$, where

\[ \mathcal{E}_1(\rho) = \int \left( \rho(x) \ln(\rho(x)) + e^{h(x)-1} - \rho(x) h(x) \right) dx, \]
\[ \mathcal{E}_2(\rho) = \int \rho(x) h(x)\,dx, \]
\[ \mathcal{E}_3(\rho) = -\int e^{h(x)-1}\,dx. \]


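Notice that the integrand in $\mathcal{E}_1$ is nonnegative (indeed, for every $a \in \mathbb{R}_+$ and $b \in \mathbb{R}$ we have
$a \ln(a) + e^{b-1} \ge a \cdot b$). This allows us to write $\mathcal{E} \ge \mathcal{E}_2 + \mathcal{E}_3$, and hence we have

\[ \mathcal{E}(\rho) \ge -\int \rho(x) \sqrt{|x|}\,dx - \int e^{-\sqrt{|x|}-1}\,dx \ge -C - \left( \int |x|\,d\rho(x) \right)^{1/2}, \]

where we used $\int d\rho(x) = 1$ in the last inequality (Hölder inequality). This gives a bound from
below of $\mathcal{E}(\rho)$ in terms of the square root of the first moment of $\rho$.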
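
The pointwise inequality $a \ln(a) + e^{b-1} \ge a b$ used above is the Young inequality for the pair of conjugate functions $t \mapsto t \ln t$ and $s \mapsto e^{s-1}$, with equality when $a = e^{b-1}$. The following sketch, with grids chosen by us purely for illustration, checks it numerically.

```python
import math

def gap(a, b):
    """a ln a + e^{b-1} - a b, with the convention 0 ln 0 = 0."""
    a_log_a = a * math.log(a) if a > 0.0 else 0.0
    return a_log_a + math.exp(b - 1.0) - a * b

# Hypothetical test grids: a in {0} U [1e-6, 1e3], b in [-10, 10].
a_grid = [0.0] + [10.0 ** (k / 4.0) for k in range(-24, 13)]
b_grid = [k / 4.0 for k in range(-40, 41)]

# The gap should never be negative (up to rounding errors).
smallest_gap = min(gap(a, b) for a in a_grid for b in b_grid)

# Equality case a = e^{b-1}: the gap should vanish (up to rounding errors).
equality_gaps = [gap(math.exp(b - 1.0), b) for b in b_grid]
```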

Then, we also express $\mathcal{E}_1$ as a supremum over compact sets, using the positivity of the
integrand:

\[ \mathcal{E}_1(\rho) = \sup_{K \subset \mathbb{R}^d,\ K \text{ compact}} \int_K \left( \rho(x) \ln(\rho(x)) + e^{h(x)-1} - \rho(x) h(x) \right) dx. \]

Now, we observe that for every sequence $\rho_n \rightharpoonup \rho$ with $\int |x|\,d\rho_n(x) \le C$ and $\mathcal{E}(\rho_n) \le C$ we
also have weak convergence $\rho_n \rightharpoonup \rho$ in $L^1$, and $\int \psi(x)\,d\rho_n(x) \to \int \psi(x)\,d\rho(x)$ for every sublinear
function $\psi$ (i.e. satisfying $\lim_{|x| \to \infty} \psi(x)/|x| = 0$, which is true for $\psi = h$). Hence, the term $\mathcal{E}_1$
is l.s.c., $\mathcal{E}_2$ is continuous and $\mathcal{E}_3$ is constant and finite for this type of convergence. Globally we
can summarize these facts in the following proposition.

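Proposition 2.1. (1) The functional $\mathcal{E} : \mathcal{P}_1(\mathbb{R}^d) \to \mathbb{R} \cup \{+\infty\}$ is well-defined and satisfies

\[ \mathcal{E}(\rho) \ge -C - \left( \int |x|\,d\rho(x) \right)^{1/2}. \]

(2) For every sequence $\rho_n \rightharpoonup \rho$ with $\int |x|\,d\rho_n(x) \le C$ and $\mathcal{E}(\rho_n) \le C$ we have
$\liminf_n \mathcal{E}(\rho_n) \ge \mathcal{E}(\rho)$.

(3) When restricted to $\mathcal{P}_2(\mathbb{R}^d)$, the functional $\mathcal{E}$ is geodesically convex in $\mathbb{W}_2$ and strictly
convex on every geodesic $t \mapsto \rho_t = ((1-t)\mathrm{id} + tT)_\# \rho$ where the map $T$ is not a translation.

(4) If $\rho_t = ((1-t)\mathrm{id} + tT)_\# \rho$, then the derivative at $t = 0$ of $t \mapsto \mathcal{E}(\rho_t)$ is given by
$-\int \nabla \cdot (T - \mathrm{id})\,d\rho$.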
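

3. The maximal correlation functional

For $\rho, \mu \in \mathcal{P}_1(\mathbb{R}^d)$, with $\int y\,d\mu(y) = 0$, we define the following quantity:

\[ \mathcal{T}(\rho,\mu) := \sup\left\{ \int (x \cdot y)\,d\gamma \,:\, \gamma \in \Pi(\rho,\mu) \right\}
 = \inf\left\{ \int u\,d\rho + \int u^*\,d\mu \,:\, u : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\} \text{ convex and l.s.c.} \right\}. \]

We notice that, if $\rho, \mu \in \mathcal{P}_2(\mathbb{R}^d)$, then we also have

\[ \mathcal{T}(\rho,\mu) = \frac12 \int |x|^2\,d\rho(x) + \frac12 \int |y|^2\,d\mu(y) - \frac12 W_2^2(\rho,\mu). \]

This functional $\mathcal{T}$ is a transport cost, but we also observe that it is the maximal correlation
between $\rho$ and $\mu$, in the sense

\[ \mathcal{T}(\rho,\mu) = \sup\left\{ \mathbb{E}[X \cdot Y] \,:\, X \sim \rho,\ Y \sim \mu \right\}. \]

For this reason, $\mathcal{T}$ will be called maximal correlation functional.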
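
To make the definition concrete, here is a small sketch on a toy case of our own choosing: two uniform empirical measures on the real line with five atoms each. For such measures the supremum over plans is attained at a permutation of the atoms (the extreme points of the set of plans are permutation matrices), and on the line the sorted pairing is optimal. The code compares the sorted pairing with a brute-force search over permutations and checks the identity with $W_2^2$ stated above. The sample points and function names are hypothetical.

```python
import itertools

def max_correlation_sorted(xs, ys):
    """T(rho, mu) for uniform empirical measures on the line: pair sorted atoms."""
    return sum(x * y for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

def max_correlation_bruteforce(xs, ys):
    """Maximum over all permutations sigma of the average of x_i * y_sigma(i)."""
    n = len(xs)
    return max(
        sum(x * y for x, y in zip(xs, perm)) / n
        for perm in itertools.permutations(ys)
    )

def w2_squared(xs, ys):
    """Squared W_2 distance between the two empirical measures (sorted pairing)."""
    return sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

# Hypothetical atoms; mu has barycenter 0, as required in the text.
rho = [1.5, -0.7, 0.2, -1.0, 2.4]
mu = [-1.0, 0.5, 2.0, -2.5, 1.0]

t_sorted = max_correlation_sorted(rho, mu)
t_brute = max_correlation_bruteforce(rho, mu)

# Identity T = (1/2) int |x|^2 d rho + (1/2) int |y|^2 d mu - (1/2) W_2^2.
m2_rho = sum(x * x for x in rho) / len(rho)
m2_mu = sum(y * y for y in mu) / len(mu)
t_from_w2 = 0.5 * m2_rho + 0.5 * m2_mu - 0.5 * w2_squared(rho, mu)
```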

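We are interested in the following properties.


Proposition 3.1. (1) For every $\rho \in \mathcal{P}_1(\mathbb{R}^d)$, we have $\mathcal{T}(\rho,\mu) \in [0,+\infty]$.

(2) If $\rho$ and $\tilde\rho$ are obtained one from the other by translation, then $\mathcal{T}(\rho,\mu) = \mathcal{T}(\tilde\rho,\mu)$.

(3) If $\int x\,d\rho_n(x) = 0$ and $\rho_n \rightharpoonup \rho$, then $\liminf_n \mathcal{T}(\rho_n,\mu) \ge \mathcal{T}(\rho,\mu)$.

(4) For every $\mu \in \mathcal{P}_1(\mathbb{R}^d)$ with $\int y\,d\mu(y) = 0$ there exists a sequence $\mu_n$ of compactly
supported measures with $\mu_n \rightharpoonup \mu$ and $\int y\,d\mu_n(y) = 0$ such that for every $\rho \in \mathcal{P}_1(\mathbb{R}^d)$ we have
$\mathcal{T}(\rho,\mu_n) \to \mathcal{T}(\rho,\mu)$.

(5) For every $\rho \in \mathcal{P}_1(\mathbb{R}^d)$ there exists a sequence $\rho_n$ of compactly supported probabilities such
that $\rho_n \rightharpoonup \rho$, $\mathcal{E}(\rho_n) \to \mathcal{E}(\rho)$ and such that for every $\mu \in \mathcal{P}_1(\mathbb{R}^d)$ with $\int y\,d\mu(y) = 0$ we
have $\mathcal{T}(\rho_n,\mu) \to \mathcal{T}(\rho,\mu)$.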

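Proof. In order to prove (1), just take $\gamma = \rho \otimes \mu \in \Pi(\rho,\mu)$. We get

\[ \mathcal{T}(\rho,\mu) \ge \int (x \cdot y)\,d(\rho \otimes \mu) = \left( \int x\,d\rho(x) \right) \cdot \left( \int y\,d\mu(y) \right) = 0. \]

To prove (2), notice that every $\tilde\gamma \in \Pi(\tilde\rho,\mu)$ can be expressed as the translation of a $\gamma \in \Pi(\rho,\mu)$,
in the sense $\int \phi(x,y)\,d\tilde\gamma(x,y) = \int \phi(x+v,y)\,d\gamma(x,y)$, where $v$ is the vector translating $\rho$ into $\tilde\rho$.
Applying this fact to $\phi(x,y) = x \cdot y$, we get $\mathcal{T}(\tilde\rho,\mu) = \mathcal{T}(\rho,\mu) + \int v \cdot y\,d\mu(y) = \mathcal{T}(\rho,\mu)$.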

We now prove (3). We restrict to the case $\mathcal{T}(\rho_n,\mu) \le C$, otherwise the statement is
straightforward. We first take optimal plans $\gamma_n$ such that $\mathcal{T}(\rho_n,\mu) = \int (x \cdot y)\,d\gamma_n(x,y)$. From the
tightness assumption on $\rho_n$, we infer that the $\gamma_n$ are also tight. Hence, we can extract a converging
subsequence $\gamma_n \rightharpoonup \gamma \in \Pi(\rho,\mu)$. If we set $\Gamma_n := \mathrm{spt}(\gamma_n)$, we have a sequence of closed sets in
$\mathbb{R}^d \times \mathbb{R}^d$. We can extract a further subsequence locally Hausdorff converging to a closed set $\Gamma$.
From the cyclical monotonicity of each $\Gamma_n$, we infer that $\Gamma$ is also cyclically monotone. Hence, $\gamma$
is also optimal, since its support is contained in $\Gamma$, which implies that it is cyclically monotone, a
condition which is sufficient to guarantee optimality. Hence we have $\mathcal{T}(\rho,\mu) = \int (x \cdot y)\,d\gamma(x,y)$.

We just need to prove $\liminf_n \int (x \cdot y)\,d\gamma_n(x,y) \ge \int (x \cdot y)\,d\gamma(x,y)$. If $x \cdot y$ were a bounded
continuous function, we would have equality. The problem is that it is not bounded. Yet, we
can prove that it is bounded from below on $\bigcup_n \Gamma_n$, which is enough.

Indeed, take $(x,y), (x',y') \in \Gamma_n$. From cyclical monotonicity we can write

\[ x \cdot y + x' \cdot y' \ge x \cdot y' + x' \cdot y. \]

If we integrate the above inequality w.r.t. $d\gamma_n(x',y')$ we get

\[ x \cdot y + \mathcal{T}(\rho_n,\mu) \ge 0. \]

Indeed, $\int x \cdot y\,d\gamma_n(x',y') = x \cdot y$, $\int x' \cdot y'\,d\gamma_n(x',y') = \mathcal{T}(\rho_n,\mu)$, $\int x \cdot y'\,d\gamma_n(x',y') = 0$ and
$\int x' \cdot y\,d\gamma_n(x',y') = 0$. This proves $x \cdot y \ge -C$ on $\bigcup_n \Gamma_n$ and allows us to prove (3).

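To prove (4), we take for instance $\mu_n := \mu \llcorner B(0,n) + \mu(B(0,n)^c)\,\delta_{v_n}$, where

\[ v_n := \frac{1}{\mu(B(0,n)^c)} \int_{B(0,n)^c} y\,d\mu(y). \]

In this case we have $\int u^*\,d\mu_n \le \int u^*\,d\mu$ for every convex function $u^*$, which gives $\mathcal{T}(\rho,\mu_n) \le
\mathcal{T}(\rho,\mu)$. Combining this inequality with the semicontinuity result of (3) we get
$\mathcal{T}(\rho,\mu_n) \to \mathcal{T}(\rho,\mu)$.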

The same construction does not work for (5), as we want to guarantee convergence of entropies.
In this case, if $\mathcal{E}(\rho) < +\infty$, we need to produce a sequence $\rho_n$ of absolutely continuous measures.
We can take $\rho_n := (\rho \llcorner B(0,n))/\rho(B(0,n))$. In this case we can check explicitly that $\mathcal{E}(\rho_n) \to
\mathcal{E}(\rho)$ by dominated convergence. Moreover, for every convex function $u$ we have $\int u\,d\rho_n \to \int u\,d\rho$
(first we subtract a linear part from $u$, using $\rho \in \mathcal{P}_1(\mathbb{R}^d)$, and then we are reduced to monotone
convergence). Hence, along this sequence, the functional $\mathcal{T}(\cdot,\mu)$ is u.s.c. as an infimum of
continuous functionals. But the semicontinuity result of (3) provides the continuity. □

We are also interested in the following estimate on $\mathcal{T}$ as a function of $\rho$ in terms of the first
moment of $\rho$. We first define the constant

\[ c(\mu) := \frac{1}{2d} \inf\left\{ \int |y \cdot e - \ell|\,d\mu(y) \,:\, e \in \mathbb{S}^{d-1},\ \ell \in \mathbb{R} \right\}. \]

Note that the infimum in the definition of $c(\mu)$ is actually a minimum (we minimize a function
which is continuous in $e$ and $\ell$, coercive w.r.t. $\ell$, and $e$ lives in a compact set). For simplicity,
we only state the estimate in the case where $\rho$ is absolutely continuous.

Proposition 3.2. If $\int x\,d\rho(x) = 0$ and $\rho \ll \mathcal{L}^d$, then we have $\mathcal{T}(\rho,\mu) \ge c(\mu) \int |x|\,d\rho(x)$. In
particular, $\mathcal{T}$ satisfies an inequality of the form $\mathcal{T}(\rho,\mu) \ge c \int |x|\,d\rho(x)$ for $c > 0$ and for every $\rho$
with $\int x\,d\rho(x) = 0$, if and only if $\mu$ is not supported on a hyperplane.

Proof. Take $\rho$ such that $\int x\,d\rho(x) = 0$ and $\rho \ll \mathcal{L}^d$ and select a vector $e \in \mathbb{S}^{d-1}$ such that
$e \mapsto \int (x \cdot e)_+\,d\rho(x)$ is maximal. Set $A^+ := \{x : x \cdot e > 0\}$ and $A^- := \{x : x \cdot e < 0\}$. By
optimality conditions, this implies that $v^+ := \int_{A^+} x\,d\rho(x)$ is oriented as $e$. From the barycenter
condition on $\rho$ and the fact that it does not charge the hyperplane $\{x : x \cdot e = 0\}$, the vector
$v^- := \int_{A^-} x\,d\rho(x)$ is opposite to $v^+$. Set $m^\pm := \rho(A^\pm)$. We have $m^+ + m^- = 1$, again from
$\rho(\mathbb{R}^d \setminus (A^+ \cup A^-)) = 0$.

Let $\ell$ be a value such that $\mu(\{x : x \cdot e > \ell\}) \le m^+$ and $\mu(\{x : x \cdot e < \ell\}) \le m^-$ (if $\mu$ does
not give mass to the hyperplane $\{x \cdot e = \ell\}$ then both inequalities are equalities). Decompose $\mu$
into the sum of two measures $\mu^\pm$ such that

\[ \mathrm{spt}\,\mu^+ \subset \{x \cdot e \ge \ell\}, \quad \mathrm{spt}\,\mu^- \subset \{x \cdot e \le \ell\}, \quad \mu^\pm(\mathbb{R}^d) = m^\pm. \]

Consider $\gamma$ obtained in the following way: distribute the mass of $\rho \llcorner A^+$ onto that of $\mu^+$ via a
tensor product, and do the same from $\rho \llcorner A^-$ onto $\mu^-$, i.e.

\[ \gamma = \frac{1}{m^+} (\rho \llcorner A^+) \otimes \mu^+ + \frac{1}{m^-} (\rho \llcorner A^-) \otimes \mu^- \in \Pi(\rho,\mu). \]

We have

\[ \int (x \cdot y)\,d\gamma = v^+ \cdot \frac{1}{m^+} \int y\,d\mu^+(y) + v^- \cdot \frac{1}{m^-} \int y\,d\mu^-(y). \]

We use that $v^+$ is oriented as $e$, and $v^- = -v^+$, and we have

\[ v^+ \cdot \frac{1}{m^+} \int y\,d\mu^+(y) + v^- \cdot \frac{1}{m^-} \int y\,d\mu^-(y) = |v^+| \left( \frac{1}{m^+} \int y \cdot e\,d\mu^+(y) - \frac{1}{m^-} \int y \cdot e\,d\mu^-(y) \right). \]

Since $\mu^+/m^+$ and $\mu^-/m^-$ are both probability measures, we can subtract the same constant $\ell$
from the two integrals and get

\[ |v^+| \left( \frac{1}{m^+} \int (y \cdot e - \ell)\,d\mu^+(y) - \frac{1}{m^-} \int (y \cdot e - \ell)\,d\mu^-(y) \right) = |v^+| \int |y \cdot e - \ell|\,d\left( \frac{\mu^+}{m^+} + \frac{\mu^-}{m^-} \right)(y). \]

Then, we use $m^\pm \le 1$ in order to estimate this last result from below with $|v^+| \int |y \cdot e - \ell|\,d\mu(y)$.
In order to conclude, we just need to observe that

\[ \int |x|\,d\rho(x) \le 2d \sup_{e \in \mathbb{S}^{d-1}} \int (x \cdot e)_+\,d\rho(x) = 2d\,|v^+|. \]

The last part of the statement is easy: if $\mu$ is not concentrated on a hyperplane, then $c(\mu) > 0$.
If $\mu$ is concentrated on a hyperplane $H$, then take an arbitrary measure $\rho$ concentrated on the
line $L$ orthogonal to $H$ and passing through the origin. One can choose it so that $\int x\,d\rho(x) = 0$
and $\int |x|\,d\rho(x) > 0$, while $\mathcal{T}(\rho,\mu) = 0$. □


Finally, we also prove displacement convexity of $\mathcal{T}$ as a function of $\rho$.

Proposition 3.3. Let $\rho_0, \rho_1 \in \mathcal{P}_2(\mathbb{R}^d)$ be absolutely continuous measures, and let
$\rho_t = ((1-t)\mathrm{id} + tT)_\# \rho_0$ be the unique constant speed geodesic connecting them for the Wasserstein distance $W_2$.
Then $t \mapsto \mathcal{T}(\rho_t,\mu)$ is convex on $[0,1]$. Its derivative at $t = 0$ is larger than $\int (T(x) - x) \cdot y\,d\gamma(x,y)$,
where $\gamma$ is an optimal transport plan from $\rho_0$ to $\mu$.

Proof. Suppose for a while that $\mu \in \mathcal{P}_2(\mathbb{R}^d)$. Then, it is well known that $\rho \mapsto -\frac12 W_2^2(\rho,\mu)$ is a
$(-1)$-displacement convex functional (see [1], Theorem 7.3.2). On the other hand, $\rho \mapsto \frac12 \int |x|^2\,d\rho(x)$
is 1-convex, and the sum of the two, which gives $\mathcal{T}(\rho,\mu) - \frac12 \int |y|^2\,d\mu(y)$, is displacement convex.

To obtain the proof for general $\mu$ in $\mathcal{P}_1(\mathbb{R}^d)$ one needs to approximate, and part (4) of
Proposition 3.1 allows us to do so. Hence, if $\rho_t$ is a geodesic for $W_2$, the inequality $\mathcal{T}(\rho_t,\mu_n) \le
(1-t)\mathcal{T}(\rho_0,\mu_n) + t\,\mathcal{T}(\rho_1,\mu_n)$ passes to the limit and implies the displacement convexity of
$\rho \mapsto \mathcal{T}(\rho,\mu)$.

For the last part of the statement, we just observe that

\[ \mathcal{T}(\rho_t,\mu) \ge \int (x \cdot y)\,d((T_t \times \mathrm{id})_\# \gamma)(x,y) = \int T_t(x) \cdot y\,d\gamma(x,y), \]

with equality at $t = 0$ since $\gamma$ is optimal. Differentiating this last term we obtain the desired expression. □


4. A variational principle for moment measures

As we sketched in the introduction, we consider the following variational problem. We fix
$\mu \in \mathcal{P}_1(\mathbb{R}^d)$ with $\int y\,d\mu(y) = 0$ and not supported on a hyperplane, and we want to solve

\[ (P) \qquad \min\left\{ \mathcal{E}(\rho) + \mathcal{T}(\rho,\mu) \,:\, \rho \in \mathcal{P}_1(\mathbb{R}^d) \right\}. \]

Theorem 4.1. The problem (P) admits a solution, which is unique up to translation. If $\rho$ is a
solution, and $u : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is a convex l.s.c. function such that $\mathcal{T}(\rho,\mu) = \int u\,d\rho + \int u^*\,d\mu$
with $u = +\infty$ on $\{\rho = 0\}$, then $\rho = c\,e^{-u}$.


Proof. To prove existence of a solution, we take a minimizing sequence $\rho_n$. We can suppose that
all $\rho_n$ have 0 as their barycenter, as translating them does not change the value of the two parts
of the functional. We use the inequality

\[ \mathcal{E}(\rho) \ge -C - \left( \int |x|\,d\rho(x) \right)^{1/2} \]

and the inequality $\mathcal{T}(\rho,\mu) \ge c \int |x|\,d\rho(x)$ that we proved in Proposition 3.2. This implies that the
moment $\int |x|\,d\rho_n(x)$ must be bounded. In particular, this gives tightness of the sequence $\rho_n$ and
we can assume $\rho_n \rightharpoonup \rho$. Also, we know that the entropy $\mathcal{E}$ is l.s.c. for the weak convergence when
the first moment is bounded, and the semicontinuity of $\mathcal{T}$ along sequences with $\int x\,d\rho_n(x) = 0$
was proven in Proposition 3.1.

This proves that a minimizer exists.

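We now analyze the optimality conditions: if $\bar\rho$ is optimal and $u$ is a convex function realizing
the minimum in the definition of $\mathcal{T}(\bar\rho,\mu)$, then

\[ \rho \mapsto \int \rho \ln \rho\,dx + \int u\,d\rho \]

is minimal for $\rho = \bar\rho$ (indeed, $\mathcal{T}(\rho,\mu) \le \int u\,d\rho + \int u^*\,d\mu$ for every $\rho$, with equality for $\rho = \bar\rho$).
By standard convex minimization arguments this implies that $\bar\rho$ also
minimizes the linearized functional

\[ \rho \mapsto \int \rho\,(\ln \bar\rho + 1)\,dx + \int u\,d\rho, \]

which implies that $\bar\rho$ is concentrated on the set of points where $\ln \bar\rho + 1 + u$ is minimal. This means
that $\bar\rho > 0$ at every point where $u < +\infty$, and at these points we need to have $\ln \bar\rho = C - u$, i.e.
$\bar\rho = c\,e^{-u}$. This same formula also holds on $\{u = +\infty\}$, since we necessarily have $\bar\rho = 0$ at those
points.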

In particular, the optimal measure is a log-concave probability density. This implies that all its
moments are finite, and it belongs to $\mathcal{P}_2(\mathbb{R}^d)$.

As for uniqueness, suppose that we have two minimizers $\rho_0, \rho_1$. We know that they must belong
to $\mathcal{P}_2(\mathbb{R}^d)$. We use the displacement convexity of the entropy and of the $\mathcal{T}$ term and observe
that the entropy is strictly convex on the geodesic $\rho_t$ unless $\rho_0$ and $\rho_1$ are obtained one from the
other by translation. This gives uniqueness up to translation. □
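
As an illustration of the theorem (not a proof), consider a one-dimensional example of our own choosing: $\mu = N(0,1)$, and competitors restricted to the centered Gaussians $\rho = N(0,s^2)$. Under these assumptions $\mathcal{E}(\rho) = -\frac12 \ln(2\pi e) - \ln s$, and the monotone coupling $X = sY$ gives $\mathcal{T}(\rho,\mu) = s$. The sketch below searches a hypothetical grid of values of $s$ and finds the minimum at $s = 1$, which is consistent with $\rho = e^{-u}$ for $u(x) = x^2/2 + \frac12 \ln(2\pi)$, whose moment measure is indeed $N(0,1)$. Restricting to Gaussians is a simplification: the theorem concerns all of $\mathcal{P}_1(\mathbb{R}^d)$.

```python
import math

def entropy(s):
    """E(rho) for rho = N(0, s^2) on the line: -(1/2) ln(2 pi e) - ln s."""
    return -0.5 * math.log(2.0 * math.pi * math.e) - math.log(s)

def max_correlation(s):
    """T(N(0, s^2), N(0, 1)) = E[X * Y] under the monotone coupling X = s * Y."""
    return s

def functional(s):
    return entropy(s) + max_correlation(s)

# Hypothetical search grid: s in {0.05, 0.10, ..., 5.00}.
grid = [0.05 * k for k in range(1, 101)]
best_s = min(grid, key=functional)
best_value = functional(best_s)
```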


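Remark 4.1. We observe that (P) has no solutions if $\mu$ is concentrated on a hyperplane.
Suppose this hyperplane is $\{x_d = 0\}$ and take $\rho_n = \frac{1}{|R_n|} \mathcal{L}^d \llcorner R_n$, where $R_n = \{x = (x_1, \dots, x_d) \in
\mathbb{R}^d : |x_i| < 1 \text{ for } i = 1, \dots, d-1,\ |x_d| < n\}$. In this case we have $\mathcal{E}(\rho_n) = -\ln|R_n| = -\ln(2^d n)$ and
$\mathcal{T}(\rho_n,\mu) \le \sqrt{d} \int |y|\,d\mu(y)$. Hence $\mathcal{E}(\rho_n) + \mathcal{T}(\rho_n,\mu) \to -\infty$.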

To prove the main result of the paper, we just need to prove that $u$ is essentially continuous.
This can be done in the following way. We recall that, given any solution $\rho$ to problem (P), we
can choose as a precise representative of $\rho$ the one given by $\rho = c\,e^{-u}$, with $u$ convex and l.s.c.
(hence $\rho$ is log-concave and u.s.c.).

Theorem 4.2. Let $\rho$ be the precise representative above of a solution of (P). Set $\Omega = \{u < +\infty\}$.
Then $\rho = 0$ $\mathcal{H}^{d-1}$-a.e. on $\partial\Omega$.

Proof. Suppose on the contrary that $\rho > 0$ on a set of positive $\mathcal{H}^{d-1}$ measure on $\partial\Omega$. Since
$\Omega$ is a convex set, we can use local coordinates and assume that this set is of the form $A =
\{(x_1,x') \in \mathbb{R} \times \mathbb{R}^{d-1} : x' \in B,\ x_1 = h(x')\}$, where $B \subset \mathbb{R}^{d-1}$ has positive Lebesgue measure and
$h$ is a convex real-valued function. In the same chart, $\Omega$ would be locally expressed as the set
of points satisfying $x_1 > h(x')$. Up to reducing the set $A$ (and hence $B$), we can suppose that
$\rho$ takes values in $[a,b]$ on $A$, for $0 < a < b < +\infty$. We also observe that $\rho$, as it is log-concave,
is locally bounded. Define $A_\varepsilon := \{(x_1,x') \in \mathbb{R} \times \mathbb{R}^{d-1} : x' \in B,\ x_1 \in [h(x'), h(x') + \varepsilon]\}$. Since
$\rho(x_1,x')$ is a continuous function of $x_1 \in [h(x'), h(x') + \varepsilon]$ (because of its log-concavity and of
the representative that we chose), we easily get $\int_{A_\varepsilon} \rho(x)\,dx = O(\varepsilon)$.

Now, we define a new density $\rho_\varepsilon$ as a competitor in (P). We define $T : A_\varepsilon \to \mathbb{R}^d$ by $T(x_1,x') =
(x_1 - \varepsilon, x')$ and we set

\[ \rho_\varepsilon = \rho \llcorner (A_\varepsilon^c) + \frac12 \rho \llcorner A_\varepsilon + \frac12 T_\# (\rho \llcorner A_\varepsilon). \]

By computing the density of $\rho_\varepsilon$ we can check

\[ \mathcal{E}(\rho_\varepsilon) = \mathcal{E}(\rho) - \rho(A_\varepsilon) \ln 2. \]

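In order to estimate $\mathcal{T}(\rho_\varepsilon,\mu)$ we take the optimal function $u$ (realizing $\mathcal{T}(\rho,\mu) = \int u\,d\rho +
\int u^*\,d\mu$) and we modify it into a function $u_\delta$ defined as follows. First, take a convex, positive
and superlinear function $\chi : \mathbb{R}^d \to \mathbb{R}$ such that $\int \chi(x)\,d\mu(x) < +\infty$. Such a function exists
because $\mu \in \mathcal{P}_1(\mathbb{R}^d)$. Then we fix $\delta > 0$ and we take $u_\delta = (u^* + \delta\chi)^*$. We have

\[ u_\delta(x) \le u(x) \text{ for every } x \in \Omega, \qquad u_\delta(T(x)) \le u(x) + \delta \chi^*\!\left( \frac{\varepsilon}{\delta} \right) \text{ for every } x \in A_\varepsilon. \]

We have

\[ \mathcal{T}(\rho_\varepsilon,\mu) \le \int u_\delta\,d\rho_\varepsilon + \int (u^* + \delta\chi)\,d\mu \le \mathcal{T}(\rho,\mu) + \frac12 \rho(A_\varepsilon)\,\delta \chi^*\!\left( \frac{\varepsilon}{\delta} \right) + \delta \int \chi\,d\mu. \]

The optimality of $\rho$ compared to $\rho_\varepsilon$ provides

\[ \rho(A_\varepsilon) \ln 2 \le \frac12 \rho(A_\varepsilon)\,\delta \chi^*\!\left( \frac{\varepsilon}{\delta} \right) + \delta \int \chi\,d\mu. \]

Now, use $\rho(A_\varepsilon) \ge c_0 \varepsilon$ and choose $\delta = c\varepsilon$ for a small constant $c$ such that $c \int \chi\,d\mu \le \frac12 c_0 \ln 2$.
This gives

\[ \frac12 \rho(A_\varepsilon) \ln 2 \le \frac{c\varepsilon}{2} \rho(A_\varepsilon)\,\chi^*(c^{-1}), \]

which is impossible as $\varepsilon \to 0$. □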


5. Sufficient optimality conditions

To complete the current study, it remains to prove that every log-concave density $\rho = e^{-u}$
such that $(\nabla u)_\# \rho = \mu$ and $u : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is an essentially continuous convex function
is necessarily a minimizer of $\mathcal{E}(\rho) + \mathcal{T}(\rho,\mu)$. This would explain that the variational principle
of the previous section finds exactly all the desired functions $u$.

As this result is not the main core of the paper, this section will be a little more sketchy than
the rest, and will use some results from [9]. Anyway, we claim that the main points of the proof
are present in the paper.

The main idea, already investigated in [2, 3] for game theory purposes, is the fact that
displacement convexity is sufficient to guarantee minimality when necessary conditions are
satisfied.

The result is the following. 

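Proposition 5.1. Suppose that $u : \mathbb{R}^d \to \mathbb{R} \cup \{+\infty\}$ is an essentially continuous convex function,
consider $\rho = e^{-u}$ and suppose $(\nabla u)_\# \rho = \mu$. Then $\rho \in \mathcal{P}_1(\mathbb{R}^d)$ and it solves

\[ \min \left\{ \mathcal{E}(\tilde\rho) + \mathcal{T}(\tilde\rho, \mu) \;:\; \tilde\rho \in \mathcal{P}_1(\mathbb{R}^d) \right\}. \]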


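Proof. Let us consider an arbitrary $\tilde\rho \in \mathcal{P}_1(\mathbb{R}^d)$ with compact support and the geodesic $\rho_t =
((1 - t)\mathrm{id} + tT)_\# \rho$, where $T = \nabla v$ is the optimal transport from $\rho$ to $\tilde\rho$. From the compact
support assumption on $\tilde\rho$, we get that $T$ is bounded. From the displacement convexity of $\mathcal{E}$ and
$\mathcal{T}(\cdot, \mu)$, in order to prove $\mathcal{E}(\tilde\rho) + \mathcal{T}(\tilde\rho, \mu) \ge \mathcal{E}(\rho) + \mathcal{T}(\rho, \mu)$ it is sufficient to prove

\[ \frac{d}{dt} \bigl( \mathcal{E}(\rho_t) + \mathcal{T}(\rho_t, \mu) \bigr) \Big|_{t=0} \ge 0. \]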

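The computation of the derivative is included in Propositions 2.1 and 3.3 and we have

\[ \frac{d}{dt} \bigl( \mathcal{E}(\rho_t) + \mathcal{T}(\rho_t, \mu) \bigr) \Big|_{t=0} \ge -\int (\Delta^{ac} v - d)\,d\rho + \int (\nabla v(x) - x) \cdot \nabla u(x)\,\rho(x)\,dx. \]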


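First we use the inequality

\[ d \int e^{-u(x)}\,dx \ge \int x \cdot \nabla u(x)\, e^{-u(x)}\,dx, \]

valid for essentially continuous convex functions $u$. This is actually an equality, but anyway the
inequality we need is proven in [9]. Hence

\[ \frac{d}{dt} \bigl( \mathcal{E}(\rho_t) + \mathcal{T}(\rho_t, \mu) \bigr) \Big|_{t=0} \ge -\int (\Delta^{ac} v)\,d\rho + \int \nabla v(x) \cdot \nabla u(x)\,\rho(x)\,dx. \]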


We then use $\Delta^{ac} v \le \Delta v$ (as $v$ is convex and its distributional Laplacian is a positive measure)
and we integrate by parts. We first do it on a ball $B(0,R)$:

\[ -\int_{B(0,R)} \rho\,d(\Delta v) + \int_{B(0,R)} \nabla v(x) \cdot \nabla u(x)\,\rho(x)\,dx = -\int_{\partial B(0,R)} \nabla v(x) \cdot n\,\rho(x)\,d\mathcal{H}^{d-1}(x). \]
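
The identity on the ball is the Gauss-Green formula applied to the measure $\Delta v$ tested against $\rho$. The sketch below assumes that $\rho = e^{-u}$ is regular enough on $B(0,R)$ for the formula to hold with no extra boundary term (this is where the essential continuity of $u$ would enter), and it uses $\nabla \rho = -\rho \nabla u$.

```latex
\begin{align*}
\int_{B(0,R)} \rho \, d(\Delta v)
 &= \int_{\partial B(0,R)} \rho \, \nabla v \cdot n \, d\mathcal{H}^{d-1}
  - \int_{B(0,R)} \nabla v \cdot \nabla \rho \, dx \\
 &= \int_{\partial B(0,R)} \rho \, \nabla v \cdot n \, d\mathcal{H}^{d-1}
  + \int_{B(0,R)} \nabla v \cdot \nabla u \, \rho \, dx .
\end{align*}
```

Moving the last integral to the left-hand side and changing sign gives the boundary identity on $\partial B(0,R)$.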

We want to pass to the limit as $R \to \infty$. The second integral may be handled using the fact that
$\nabla u(x)\rho(x) = -\nabla\rho(x)$ and $\nabla\rho \in L^1$ (this corresponds to $\mu \in \mathcal{P}_1(\mathbb{R}^d)$) and that $T = \nabla v \in L^\infty$.
In the first integral, we use $(\Delta v)(B(0,R)) = \int_{\partial B(0,R)} \nabla v(x) \cdot n\,d\mathcal{H}^{d-1}(x) \le CR^{d-1}$, together with
the exponential decay of $\rho = e^{-u}$, so that $\rho \in L^1(\Delta v)$. On the right-hand side, we use again the
exponential decay of $\rho$ with the polynomial growth of $\mathcal{H}^{d-1}(\partial B(0,R))$ and the boundedness
of $\nabla v$.

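Hence, we get

\[ \frac{d}{dt} \bigl( \mathcal{E}(\rho_t) + \mathcal{T}(\rho_t, \mu) \bigr) \Big|_{t=0} \ge \lim_{R \to \infty} -\int_{\partial B(0,R)} \nabla v(x) \cdot n\,\rho(x)\,d\mathcal{H}^{d-1}(x) = 0. \]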

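This proves the optimality of $\rho$ when compared to compactly supported measures $\tilde\rho$. For a
general $\tilde\rho$, we use part (5) of Proposition 3.1, which allows us to approximate every $\tilde\rho \in \mathcal{P}_1(\mathbb{R}^d)$
with compactly supported measures $\rho_n$ so that $\mathcal{E}(\rho_n) + \mathcal{T}(\rho_n, \mu) \to \mathcal{E}(\tilde\rho) + \mathcal{T}(\tilde\rho, \mu)$. □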
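
As a sanity check of Proposition 5.1 in the simplest case, the sketch below takes d = 1 and u(x) = x^2/2 + (1/2) ln(2 pi), so that e^{-u} is the standard Gaussian density and its moment measure is again N(0,1). It assumes that the transport term is the maximal correlation, i.e. the supremum of the integral of x*y over couplings, which is the Kantorovich dual of the infimum over u used in this paper. In one dimension this supremum is attained by the monotone coupling and can be written with quantile functions. The function names and the restriction to Gaussian competitors N(m, s^2) are choices made for this illustration only.

```python
import math
from statistics import NormalDist


def entropy(m, s, n=4000, width=10.0):
    """Midpoint-rule approximation of the integral of rho*ln(rho) for rho = N(m, s^2)."""
    dist = NormalDist(m, s)
    a = m - width * s
    h = 2.0 * width * s / n
    total = 0.0
    for i in range(n):
        p = dist.pdf(a + (i + 0.5) * h)
        if p > 0.0:
            total += p * math.log(p) * h
    return total


def max_correlation(m, s, n=4000):
    """Midpoint rule for the integral over q in (0,1) of F_rho^{-1}(q) * F_mu^{-1}(q),
    with mu = N(0,1); this is the cost of the monotone coupling (1D assumption)."""
    rho, mu = NormalDist(m, s), NormalDist(0.0, 1.0)
    total = 0.0
    for i in range(n):
        q = (i + 0.5) / n
        total += rho.inv_cdf(q) * mu.inv_cdf(q) / n
    return total


def objective(m, s):
    """Entropy plus transport term for the Gaussian competitor N(m, s^2)."""
    return entropy(m, s) + max_correlation(m, s)


if __name__ == "__main__":
    for s in (0.5, 0.8, 1.0, 1.25, 2.0):
        print(f"s = {s:4.2f}  objective = {objective(0.0, s):.4f}")
```

For Gaussian competitors the objective equals s - ln s - (1/2) ln(2 pi e) up to quadrature error, so within this family it is smallest at s = 1 and does not depend on the mean m.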


6. From this variational problem to the one studied in [9] 


In this last section we want to make some short comments on (P) and connect it to the 
problem studied in [9]. 

From the dual formulation of $\mathcal{T}$, we may re-write our problem as a min-min problem

\[ \min \left\{ \int \rho(x) \ln \rho(x)\,dx + \int u(x)\rho(x)\,dx + \int u^*(y)\,d\mu(y) \;:\; \rho \in \mathcal{P}_1(\mathbb{R}^d),\ u \text{ convex} \right\}. \]

The approach of the present paper consists in considering $u$ as a secondary variable: for every
$\rho$ we compute the minimum over possible $u$, which gives rise to the functional $\mathcal{T}(\rho, \mu)$.

A different possible approach could consist in looking at $\rho$ as the secondary variable, i.e.
considering for every $u$ the optimal $\rho$. It is easy to see that one can compute

\[ \inf \left\{ \int \rho(x) \ln \rho(x)\,dx + \int u(x)\rho(x)\,dx \;:\; \rho \ge 0,\ \int \rho(x)\,dx = 1 \right\} \]


by a Lagrange multiplier approach, which means that we need to choose $\rho$ so that $\ln \rho(x) + u(x) =
\mathrm{const}$. Hence, $\rho$ is proportional to $e^{-u}$. Let us write $c = \int e^{-u(x)}\,dx$. We have $\rho = e^{-u}/c$ and we
can compute

\[ \int \rho(x) \ln \rho(x)\,dx + \int u(x)\rho(x)\,dx = \frac{1}{c} \int \bigl( e^{-u}(-u - \ln c) + u e^{-u} \bigr)\,dx = -\ln c. \]
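
The Lagrange-multiplier computation can be checked on a discretized problem. The following is a minimal sketch, not part of the paper: integrals are replaced by Riemann sums on a grid, and the convex potential u(x) = x^4/4 + x is a hypothetical choice made only for illustration. On the grid the identity is exact, because the functional equals -ln c plus the relative entropy of the competitor with respect to e^{-u}/c.

```python
import math

# Midpoint grid on [-5, 5]; Riemann sums with weight h stand in for the integrals.
n = 200
h = 10.0 / n
xs = [-5.0 + (i + 0.5) * h for i in range(n)]

# Hypothetical convex potential, chosen only for illustration.
u = [x**4 / 4.0 + x for x in xs]


def free_energy(rho):
    """Discrete version of: integral of rho*ln(rho) plus integral of u*rho."""
    return sum((r * math.log(r) + ui * r) * h for r, ui in zip(rho, u) if r > 0.0)


def normalize(weights):
    """Rescale nonnegative weights so that sum(rho) * h equals 1."""
    mass = sum(weights) * h
    return [w / mass for w in weights]


c = sum(math.exp(-ui) for ui in u) * h        # discrete version of c = integral of e^{-u}
gibbs = [math.exp(-ui) / c for ui in u]       # candidate minimizer e^{-u} / c
gaussian = normalize([math.exp(-x * x / 2.0) for x in xs])  # one competitor
uniform = normalize([1.0] * n)                               # another competitor

value_gibbs = free_energy(gibbs)

if __name__ == "__main__":
    print("free energy at e^{-u}/c :", value_gibbs)
    print("-ln c                   :", -math.log(c))
    print("Gaussian competitor     :", free_energy(gaussian))
    print("uniform competitor      :", free_energy(uniform))
```

The first two printed values agree up to rounding, while both competitors give strictly larger values.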


This means that we consider the functional $u \mapsto -\ln\left(\int e^{-u}\,dx\right)$, which is a concave functional
of $u$, and we solve

\[ \min \left\{ \int u^*\,d\mu - \ln\left( \int e^{-u(x)}\,dx \right) \;:\; u \text{ convex} \right\}. \]

In the above minimization problem, the first term is convex, since $u \mapsto u^*(y)$ is convex, but
the second is concave.

Notice that this transformation of a convex-concave minimization problem (i.e., the
minimization of the difference of two convex functions) into another convex-concave minimization
is what is usually known as Toland duality (see [14, 15]). In particular, we also refer to [8] for
the applications of this notion to the case of variational problems involving the term $-W_2^2$, and
their connections to variational problems under convexity constraints.

What [9] does is to consider the same problem in terms of $u^*$ instead of $u$: the first term
becomes linear and the second, magically, convex (thanks to a clever application of a
quantitative Prékopa inequality, which corresponds, as we said in the introduction, to the displacement
convexity of the entropy). One should not be astonished that they obtain a convex problem:
convexity in $u^*$ more or less corresponds to the displacement convexity of the functional that
we study here in terms of $\rho$ (more precisely, convexity in $\nabla u^*$ corresponds to the convexity on
generalized geodesics with base measure $\mu$).